~Julia Fezveluburjip 17.Dec.03 03:03 PM a Web browser Domino Server 6.0.2 CF2 Solaris
We are running a Domino cluster on two machines, with the ICM running on its own separate partitioned server on one of the machines. Since upgrading to Domino 6 and Solaris 9, the ICM regularly keels over and dies (well, it stops responding and the partitioned Domino server refuses connections). To add to the agony, running NSD -kill usually fails to identify the processes attached to this server instance, and I have to resort to using pkill -9 -u to stop it.
I have, along with resident Solaris experts, spent many hours trying to track down the problem. I have discovered the following (although I don't know if it's related):
1. The fault recovery log for the ICM server is full of entries that read "[13] ERROR: Message queue failure: REMOVE ITEM" at a frequency of about three a minute. These generally start within about 30 minutes, or much less, of the other Domino server on the same machine starting.
2. The other partitioned server on the same machine writes "[11] ERROR: Process being removed not in queue: xxxx" to the fault recovery log when it shuts down.
3. ipcs reveals that the message queue for the ICM server has vanished when the errors are appearing in its fault recovery log.
4. The ICM server regularly complains that another Domino process is sharing log.nsf, which it isn't as far as I can see.
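The vanishing queue in item 3 could be watched for automatically. Below is a minimal Python sketch, assuming Solaris-style `ipcs -q` output in which each message queue line starts with `q` followed by ID, key, mode, owner and group. The owner names `icm` and `notes` and the sample text are made up for illustration, so check the column layout against the real output before relying on it.

```python
import subprocess


def parse_message_queues(ipcs_text):
    """Return one dict per message-queue line found in `ipcs -q` output.

    Assumes the column layout: T ID KEY MODE OWNER GROUP, with T == 'q'.
    """
    queues = []
    for line in ipcs_text.splitlines():
        fields = line.split()
        # Queue lines start with a lone 'q' and have at least six columns;
        # the header and "Message Queues:" lines fail this check.
        if len(fields) >= 6 and fields[0] == "q":
            queues.append({
                "id": fields[1],
                "key": fields[2],
                "mode": fields[3],
                "owner": fields[4],
                "group": fields[5],
            })
    return queues


def queues_owned_by(ipcs_text, owner):
    """Filter the parsed queues down to those belonging to one Unix user."""
    return [q for q in parse_message_queues(ipcs_text) if q["owner"] == owner]


def current_ipcs_output():
    """Run the real `ipcs -q`; only meaningful on the affected host."""
    result = subprocess.run(["ipcs", "-q"], capture_output=True,
                            text=True, check=True)
    return result.stdout


# Hypothetical sample output, not taken from the system described above.
SAMPLE = """\
IPC status from <running system> as of Wed Dec 17 15:03:00 GMT 2003
T         ID      KEY        MODE        OWNER    GROUP
Message Queues:
q          0   0x0000a1b2 --rw-rw-rw-      icm    notes
q          1   0x0000c3d4 --rw-rw-rw-    notes    notes
"""

if __name__ == "__main__":
    # An empty list here would correspond to the "queue has vanished" symptom.
    print(queues_owned_by(SAMPLE, "icm"))
```

Run from cron or a loop with `current_ipcs_output()` in place of `SAMPLE`, this would at least timestamp the moment the ICM server's queue disappears, which can then be lined up against the fault recovery log entries.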
We have the rlim_fd_max set at 65536 and msgtql at 1024 (and have tried 2048!).
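On Solaris these tunables are normally set in /etc/system, so it may be worth confirming that the file actually contains the intended values. The sketch below parses simple `set name=value` lines. The sample contents, including the `msgsys:msginfo_msgtql` spelling, are an assumption about how the values were entered rather than something stated in the post.

```python
def parse_etc_system(text):
    """Collect `set [module:]symbol=value` entries from /etc/system text.

    Lines starting with '*' are comments in /etc/system and are skipped.
    Only plain '=' assignments are handled in this sketch.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*"):
            continue
        if not line.startswith("set ") or "=" not in line:
            continue
        name, _, value = line[4:].partition("=")
        value = value.strip()
        try:
            # Base 0 accepts both decimal and 0x-prefixed hex values.
            settings[name.strip()] = int(value, 0)
        except ValueError:
            # Keep anything unparseable as a string rather than guessing.
            settings[name.strip()] = value
    return settings


# Hypothetical file contents matching the values quoted above.
SAMPLE_ETC_SYSTEM = """\
* file descriptor and message queue limits (illustrative)
set rlim_fd_max=65536
set msgsys:msginfo_msgtql=1024
"""

if __name__ == "__main__":
    print(parse_etc_system(SAMPLE_ETC_SYSTEM))
```

Changes to /etc/system only take effect after a reboot, so a file that looks right does not by itself prove the running kernel is using those values.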